-
Notifications
You must be signed in to change notification settings - Fork 838
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use BitChunks in equal_bits #2194
Conversation
I took the liberty of fixing the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure about the change to boolean_equal
-- everything else looks good to me
@@ -37,6 +38,21 @@ use std::sync::Arc; | |||
|
|||
use super::equal::equal; | |||
|
|||
#[inline] | |||
pub(crate) fn contains_nulls( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given the never ending confusion about "is this a size in bits or bytes" I recommend making it explicit -- either add a docstring that says offset
and len
are in bits
or perhaps name them bit_offset
and bit_len
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given the potential usefulness (and non obviousness) of using BitSliceIterator
to check for nulls, I wonder what you think about making this a function on BitSliceIterator
such as BitSliceIterator::contains_only_unset_bits
or something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could definitely see this being promoted to a method on BitMap, once that supports slicing #1802
|
||
#[test] | ||
fn test_contains_nulls() { | ||
let buffer: Buffer = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if using a buffer with more than 64 bits is important (or is that already well enough covered in BitSliceIterator
tests)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy they are well covered by BitSliceIterator
|
||
let lhs_start = lhs.offset() + lhs_start; | ||
let rhs_start = rhs.offset() + rhs_start; | ||
|
||
(0..len).all(|i| { | ||
BitIndexIterator::new(lhs_null_bytes, lhs_start, len).all(|i| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this correct? It seems like it will would not find positions where rhs
was null but lhs
was not. Maybe I mis understand something
It seems like we need to iterate over the positions where either is set -- like lhs_null_bytes | rhs_null_bytes
?
Maybe there is a lack of test coverage 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or maybe you can just call equal_bits
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These methods are purely checking value equality, at this point the null masks have already been checked for equality. Otherwise the previous logic would have been ill-formed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I think I missed that this was checking BooleanArray
(and hence this get_bit
is getting the value as you say). 👍
|
||
let lhs_start = lhs.offset() + lhs_start; | ||
let rhs_start = rhs.offset() + rhs_start; | ||
|
||
(0..len).all(|i| { | ||
BitIndexIterator::new(lhs_null_bytes, lhs_start, len).all(|i| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I think I missed that this was checking BooleanArray
(and hence this get_bit
is getting the value as you say). 👍
Benchmark runs are scheduled for baseline = 6c77cd5 and contender = 7199b1b. 7199b1b is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
Which issue does this PR close?
Closes #2186
Part of #2188
Rationale for this change
Technically equal_bool_513 has regressed, but this is a fraction of a nanosecond so I think is unlikely to be all that meaningful in practice, and the significant millisecond improvements elsewhere are justification for this change
What changes are included in this PR?
Are there any user-facing changes?